3D Computer Vision (2024/25)¶
Exercise¶
Upload: 30.10.2024 (11:30)
Deadline: 16.01.2025 (23:59)
Group 07¶
- Daiana Tei
- Ahsan Aftab
- Dhruv Kale
By submitting this exercise, we confirm the following:
- All people listed above contributed to this solution
- No other people were involved in this solution
- No contents of this solution were copied from others (this includes people, large language models, websites, etc.)
3D Scene Reconstruction¶
Task Overview¶
Your task in this exercise is to do a dense reconstruction of a scene. This will involve multiple steps that you will encounter and learn about as the semester progresses. You can start implementing individual steps as soon as you learn about them or wait until you have learned more to implement everything together. In the latter case, be mindful that this exercise is designed for an entire semester and the workload is accordingly large.
You will be given the following data:
- 9 color images of the scene.
- 8 Bit RGB per pixel.
- Each image rendered from a different position.
- The camera used had lens distortion.
- 9 Depth images of the scene.
- 8 Bit Grayscale per pixel. The result of dividing the Z-depth by each image's maximum and then multiplying by 255.
- Each image has the same pose as the corresponding RGB image.
- The camera used was free of any distortions.
- 1 Dictionary containing camera calibration parameters.
- They belong to the camera that was used to render the RGB images.
- Distortion coefficients are given in the standard [k1, k2, p1, p2, k3] order.
- 1 Numpy array containing 8 camera transformations.
- They specify the movements that the camera went through to render all images.
- I.e. idx 0 specifies the transformation from 00.png to 01.png, idx 1 specifies the transformation from 01.png to 02.png, ...
- This applies to both RGB and Depth images, as they have the same poses.
- 1 Numpy array containing 7 features.
- The features are specified for each of the 9 images.
- Each feature is a 2D pixel location in "H, W" order, meaning the first value is the height/row in the image and the second width/column.
- If a feature was not visible, it was entered as [-1, -1].
- The features are unsorted, meaning that feature idx 0 for 00.png could be corresponding to e.g. feature idx 4 for 01.png.
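Since the depth images store Z normalised per image, recovering 3D geometry from them means undoing that normalisation and inverting the pinhole projection. A minimal sketch, where `z_max` is a hypothetical placeholder for the per-image maximum Z-depth (which is not stored in the 8-bit PNG itself):

```python
import numpy as np

def backproject_depth(depth_u8, K, z_max=1.0):
    """Backproject an 8-bit depth image into camera-space 3D points.

    z_max is hypothetical: the per-image maximum Z used for normalisation
    must be recovered elsewhere (e.g. from triangulated features)."""
    h, w = depth_u8.shape
    z = depth_u8.astype(np.float64) / 255.0 * z_max   # undo the 8-bit normalisation
    u, v = np.meshgrid(np.arange(w), np.arange(h))    # pixel grid (u = column, v = row)
    x = (u - K[0, 2]) / K[0, 0] * z                   # X = (u - cx) / fx * Z
    y = (v - K[1, 2]) / K[1, 1] * z                   # Y = (v - cy) / fy * Z
    return np.stack([x, y, z], axis=-1)               # shape (H, W, 3)
```

Transforming these camera-space points by each view's pose and concatenating them across views is what ultimately yields the dense point cloud.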
Solution requirements¶
- Your code needs to compile, run, and produce an output.
- Your target output should be a dense point cloud reconstruction (without holes) of the scene.
- The output should be in the .ply format. We provide a function that exports a .ply file.
- You may inspect your .ply outputs in e.g. Meshlab.
- See the 'Dense Point Cloud' sample image to get an idea of what is possible. (Meshlab screenshot with point shading set to None)
- Your code should be a general solution.
- This means that it could run correctly for a different dataset (with same input structure).
- It should NOT include anything hardcoded specific to this dataset.
- Your code should not be unnecessarily inefficient.
- Our sample solution runs in less than 2 minutes total (including point cloud export).
- If your solution runs for more than 10 minutes, you are being wasteful in some part of your program.
Environment setup¶
!python --version
Python 3.11.7
!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install scikit-learn
!pip3 install opencv-python
!pip install scipy
!pip install torch
!pip install torchvision
!pip3 install open3d
(pip output truncated: all requirements already satisfied; pip also printed an upgrade notice 23.2.1 -> 24.3.1.)
Imports¶
Please note the following:
- These are all imports necessary to achieve the sample results.
- You may remove and/or add other libraries at your own convenience.
- Using library functions (from the given or other libraries) that bypass necessary computer vision tasks will not be recognized as 'solved'.
- E.g.: If you need to undistort an image to get to the next step of the solution and use the library function cv2.undistort(), then we will evaluate the undistortion step as 'failed'.
- E.g.: If you want to draw points in an image (to check your method or visualize in-between steps) and use the library function cv2.circle(), then there is no problem.
- E.g.: If you need to perform complex mathematical operations and use some numpy function, then there is no problem.
- E.g.: You do not like a provided utility function and find/know a library function that gives the same outputs from the same inputs, then there is no problem.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import cv2
import scipy
import torch
import torchvision
# !pip install open3d
import open3d as o3d
Data¶
This should load all available data and also create some output directories. Feel free to rename variables or add additional directories as you see fit.
#Inputs
base_path = os.getcwd()
data_path = os.path.join(base_path, "data")
img_path = os.path.join(data_path, 'images')
depth_path = os.path.join(data_path, 'depths')
print(f"The project's root path is '{base_path}'.")
print(f"Reading data from '{data_path}'.")
print(f"Image folder: '{img_path}'.")
print(f"Depth folder: '{depth_path}'.")
#Outputs
out_path = os.path.join(base_path, 'output')
ply_path = os.path.join(out_path, 'point_cloud')
os.makedirs(out_path, exist_ok=True)
os.makedirs(ply_path, exist_ok=True)
print(f"\nCreating directory '{out_path}'.")
print(f"Creating directory '{ply_path}'.")
#Load Data
camera_calibration = np.load(os.path.join(data_path, 'camera_calibration.npy'), allow_pickle=True)
camera_calibration = camera_calibration.item()#get dictionary from numpy array struct
given_features = np.load(os.path.join(data_path, 'given_features.npy'), allow_pickle=True)
camera_movement = np.load(os.path.join(data_path, 'camera_movement.npy'), allow_pickle=True)
print(f"\nCamera Calibration:")
for key, value in camera_calibration.items():
    print(f"  {key}: {value}")
print(f"Camera Movement: {camera_movement.shape}")#[Cameras-1, 4, 4]
print(f"2D Features (Unsorted): {given_features.shape}")#[Camera_idx, Feature_idx, 2]
The project's root path is '/Users/ahsanaftab/ERIKO/3D-Scene-Reconstruction'.
Reading data from '/Users/ahsanaftab/ERIKO/3D-Scene-Reconstruction/data'.
Image folder: '/Users/ahsanaftab/ERIKO/3D-Scene-Reconstruction/data/images'.
Depth folder: '/Users/ahsanaftab/ERIKO/3D-Scene-Reconstruction/data/depths'.

Creating directory '/Users/ahsanaftab/ERIKO/3D-Scene-Reconstruction/output'.
Creating directory '/Users/ahsanaftab/ERIKO/3D-Scene-Reconstruction/output/point_cloud'.

Camera Calibration:
  distortion_param: [-0.1, 0.02, 0.0, 0.0, -0.01]
  image_height: 551
  image_width: 881
  principal_point: [275.0, 440.0]
  focal_length_mm: 25
  sensor_width_mm: 35
  pixel_ratio: 1.0
  pixel_per_mm: 25.17142857142857
  focal_length_px: 629.2857142857142
Camera Movement: (8, 4, 4)
2D Features (Unsorted): (9, 7, 2)
import glob
rgb_paths = sorted(glob.glob("data/images/*.png"))
depth_paths = sorted(glob.glob("data/depths/*.png"))
feature_data = np.load("data/given_features.npy")
output_folder = "output"
for i in range(len(rgb_paths)):
    rgb_image = cv2.imread(rgb_paths[i])
    depth_map = cv2.imread(depth_paths[i], cv2.IMREAD_GRAYSCALE)
    features = feature_data[i]
    plt.figure(figsize=(15, 5))
    plt.subplot(1, 3, 1)
    plt.imshow(cv2.cvtColor(rgb_image, cv2.COLOR_BGR2RGB))
    plt.title(f"RGB Image {i}")
    plt.subplot(1, 3, 2)
    plt.imshow(depth_map, cmap="jet_r")
    plt.colorbar()
    plt.title(f"Depth Map {i}")
    plt.subplot(1, 3, 3)
    plt.imshow(cv2.cvtColor(rgb_image, cv2.COLOR_BGR2RGB))
    for feature in features:
        if feature[0] != -1:  # features not visible in an image are stored as [-1, -1]
            plt.plot(feature[1], feature[0], 'ro', markersize=2)
    plt.title(f"Features {i}")
    plt.tight_layout()
    plt.show()
Undistortion¶
focal_length_px = camera_calibration['focal_length_px']
principal_point = camera_calibration['principal_point']
coeffs = camera_calibration['distortion_param']
K = np.array([
    [focal_length_px, 0, principal_point[1]],  # principal_point is stored (row, col)
    [0, focal_length_px, principal_point[0]],
    [0, 0, 1]
], dtype=np.float64)
print("Camera matrix K:")
print(K)
print("\nDistortion coefficients:")
print(coeffs)
Camera matrix K:
[[629.28571429   0.         440.        ]
 [  0.         629.28571429 275.        ]
 [  0.           0.           1.        ]]

Distortion coefficients:
[-0.1, 0.02, 0.0, 0.0, -0.01]
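As a consistency check of the printed calibration (assuming the sensor width spans the full image width), the pixel-space focal length follows from the metric values:

```python
image_width_px = 881     # camera_calibration['image_width']
sensor_width_mm = 35     # camera_calibration['sensor_width_mm']
focal_length_mm = 25     # camera_calibration['focal_length_mm']

pixel_per_mm = image_width_px / sensor_width_mm    # ~25.1714 px/mm, matches the dictionary
focal_length_px = focal_length_mm * pixel_per_mm   # ~629.2857 px, matches the dictionary
```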
def undistort_image_inverse_sampling(image, file_name, K, distortion_params):
    h, w = image.shape[:2]
    f_x, f_y = K[0, 0], K[1, 1]
    u_0, v_0 = K[0, 2], K[1, 2]
    k1, k2, p1, p2, k3 = distortion_params
    # grid of undistorted pixel coordinates
    u_grid, v_grid = np.meshgrid(np.arange(w), np.arange(h))
    # normalised coordinates
    x = (u_grid - u_0) / f_x
    y = (v_grid - v_0) / f_y
    # inverse sampling: apply the *forward* distortion model to the target
    # (undistorted) grid to find where to sample in the distorted image
    r2 = x**2 + y**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    tangential_x = 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    tangential_y = p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    x_dist = x * radial + tangential_x
    y_dist = y * radial + tangential_y
    # map distorted normalised coordinates back to pixels
    map_x = (f_x * x_dist + u_0).astype(np.float32)
    map_y = (f_y * y_dist + v_0).astype(np.float32)
    # bilinear interpolation to sample the distorted image
    undistorted = cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
    os.makedirs(output_folder, exist_ok=True)
    output_path = os.path.join(output_folder, file_name)
    cv2.imwrite(output_path, undistorted)
    print(f"Saved undistorted image to {output_path}")
    return undistorted
for i in range(len(rgb_paths)):
    distorted_image = cv2.imread(rgb_paths[i])
    undistorted_image = undistort_image_inverse_sampling(distorted_image, f"{i}.png", K, coeffs)
    plt.figure(figsize=(20, 10))
    plt.subplot(1, 2, 1)
    plt.imshow(cv2.cvtColor(distorted_image, cv2.COLOR_BGR2RGB))
    plt.title("Distorted Image")
    plt.subplot(1, 2, 2)
    plt.imshow(cv2.cvtColor(undistorted_image, cv2.COLOR_BGR2RGB))
    plt.title("Undistorted Image")
    plt.show()
undistorted_images = sorted(glob.glob(output_folder + "/*.png"))
Saved undistorted image to output/0.png
Saved undistorted image to output/1.png
Saved undistorted image to output/2.png
Saved undistorted image to output/3.png
Saved undistorted image to output/4.png
Saved undistorted image to output/5.png
Saved undistorted image to output/6.png
Saved undistorted image to output/7.png
Saved undistorted image to output/8.png
Matching features¶
Data observation¶
print(feature_data)
[[[163 616] [431 593] [380 672] [164 660] [378 462] [280 422] [274 650]] [[192 722] [183 694] [398 444] [495 651] [495 446] [346 704] [282 376]] [[252 536] [384 621] [291 480] [157 501] [ -1 -1] [ -1 -1] [157 540]] [[ -1 -1] [199 614] [ -1 -1] [307 657] [195 658] [409 690] [285 422]] [[168 693] [474 681] [ -1 -1] [285 374] [157 722] [322 722] [473 446]] [[285 481] [207 500] [ -1 -1] [ -1 -1] [207 539] [ -1 -1] [303 539]] [[187 650] [448 574] [284 636] [386 649] [313 424] [177 610] [383 460]] [[ -1 -1] [ -1 -1] [ -1 -1] [ -1 -1] [444 367] [ -1 -1] [505 444]] [[472 485] [350 602] [473 631] [259 554] [ -1 -1] [ -1 -1] [ -1 -1]]]
for i in range(len(undistorted_images)):
    rgb_image = cv2.imread(undistorted_images[i])
    features = feature_data[i]
    plt.imshow(cv2.cvtColor(rgb_image, cv2.COLOR_BGR2RGB))
    for feature in features:
        if feature[0] != -1:  # features not visible in an image are stored as [-1, -1]
            plt.plot(feature[1], feature[0], 'ro', markersize=2)
    plt.title(f"Features {i}")
    plt.tight_layout()
    plt.show()
print(camera_movement)
[[[ 7.07106781e-01 -1.83012702e-01 6.83012702e-01 -3.00000000e+00] [ 1.83012702e-01 9.80379875e-01 7.32233047e-02 -2.58819045e-01] [-6.83012702e-01 7.32233047e-02 7.26726907e-01 9.65925826e-01] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]] [[ 6.12323400e-17 2.58819045e-01 -9.65925826e-01 4.00000000e+00] [-2.58819045e-01 9.33012702e-01 2.50000000e-01 -1.03527618e+00] [ 9.65925826e-01 2.50000000e-01 6.69872981e-02 3.86370331e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]] [[ 7.07106781e-01 -1.83012702e-01 6.83012702e-01 -2.82842712e+00] [ 1.29893408e-16 9.65925826e-01 2.58819045e-01 -1.00000000e+00] [-7.07106781e-01 -1.83012702e-01 6.83012702e-01 1.41421356e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]] [[ 7.07106781e-01 -1.29893408e-16 7.07106781e-01 -3.00000000e+00] [ 1.29893408e-16 1.00000000e+00 5.38036114e-17 -2.22044605e-16] [-7.07106781e-01 5.38036114e-17 7.07106781e-01 1.00000000e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]] [[ 6.12323400e-17 1.83697020e-16 -1.00000000e+00 4.00000000e+00] [-1.83697020e-16 1.00000000e+00 1.83697020e-16 -7.77156117e-16] [ 1.00000000e+00 1.83697020e-16 6.12323400e-17 4.00000000e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]] [[ 7.07106781e-01 -2.08398031e-16 7.07106781e-01 -2.82842712e+00] [ 2.98836239e-01 9.06307787e-01 -2.98836239e-01 1.21494310e+00] [-6.40856382e-01 4.22618262e-01 6.40856382e-01 2.12694929e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]] [[ 7.07106781e-01 -2.98836239e-01 6.40856382e-01 -3.00000000e+00] [ 5.00000000e-01 8.52165513e-01 -1.54317655e-01 1.41421356e+00] [-5.00000000e-01 4.29547251e-01 7.51990132e-01 0.00000000e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]] [[ 6.12323400e-17 7.07106781e-01 -7.07106781e-01 2.00000000e+00] [-7.07106781e-01 5.00000000e-01 5.00000000e-01 -1.41421356e+00] [ 7.07106781e-01 5.00000000e-01 5.00000000e-01 1.41421356e+00] [ 
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]]]
Camera transformations visualisation¶
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def look_at(eye, target, up):
    eye = np.array(eye, dtype=np.float64)
    target = np.array(target, dtype=np.float64)
    up = np.array(up, dtype=np.float64)
    forward = target - eye
    forward /= np.linalg.norm(forward)
    # right-handed basis: right = forward x up, up = right x forward
    # (the reversed cross products would produce a reflection, det(R) = -1)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    rotation_matrix = np.eye(4)
    rotation_matrix[:3, 0] = right
    rotation_matrix[:3, 1] = up
    rotation_matrix[:3, 2] = -forward
    translation_matrix = np.eye(4)
    translation_matrix[:3, 3] = -eye
    return rotation_matrix @ translation_matrix
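A useful sanity check for any pose handled here: a rigid transform must have an orthonormal rotation block with determinant +1. The helper below is hypothetical (not part of the provided utilities); applying it to the output of `look_at` flags a reflection (determinant -1), which would indicate the cross products in the basis construction are ordered inconsistently:

```python
import numpy as np

def is_rigid_transform(T, tol=1e-9):
    """Check that a 4x4 matrix is a rigid transform:
    orthonormal rotation block with det +1 and a [0, 0, 0, 1] bottom row."""
    R = T[:3, :3]
    ok_rotation = np.allclose(R @ R.T, np.eye(3), atol=tol)
    ok_determinant = np.isclose(np.linalg.det(R), 1.0, atol=tol)
    ok_bottom_row = np.allclose(T[3], [0.0, 0.0, 0.0, 1.0], atol=tol)
    return bool(ok_rotation and ok_determinant and ok_bottom_row)
```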
# Initial position
eye = [0, 0, 0]
target = [1, 1, 0]
up = [0, 0, 1]
initial_pose = look_at(eye, target, up)
camera_poses = [initial_pose]
current_pose = initial_pose.copy()
for transform in camera_movement:
    current_pose = current_pose @ transform
    camera_poses.append(current_pose)
camera_positions = np.array([pose[:3, 3] for pose in camera_poses])
camera_rotations = np.array([pose[:3, :3] for pose in camera_poses])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(camera_positions[:, 0], camera_positions[:, 1], camera_positions[:, 2], c='r')
for i, (pose, R) in enumerate(zip(camera_positions, camera_rotations)):
    x_axis = R[:, 0]
    y_axis = R[:, 1]
    z_axis = R[:, 2]
    start = pose
    length = 0.5
    ax.quiver(start[0], start[1], start[2], x_axis[0], x_axis[1], x_axis[2], color='r', length=length)
    ax.quiver(start[0], start[1], start[2], y_axis[0], y_axis[1], y_axis[2], color='g', length=length)
    ax.quiver(start[0], start[1], start[2], z_axis[0], z_axis[1], z_axis[2], color='b', length=length)
    ax.text(start[0], start[1], start[2], f' {i}', None)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
Feature matching¶
Computing fundamental matrices¶
def get_essential_matrix(R, t):
    """
    Computes essential matrix E (3x3) from rotation matrix (3x3) and translation vector (3,).
    """
    # Skew-symmetric matrix for t
    t_skew = np.array([
        [0, -t[2], t[1]],
        [t[2], 0, -t[0]],
        [-t[1], t[0], 0]
    ])
    # E = [t]_x R
    E = t_skew @ R
    return E

def get_fundamental_matrix(E, K):
    """
    Computes fundamental matrix F (3x3) from essential matrix E (3x3) and camera matrix K (3x3).
    """
    # F = K^-T E K^-1
    K_inv = np.linalg.inv(K)
    F = K_inv.T @ E @ K_inv
    # Normalisation
    F = F / F[2, 2]
    return F
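A synthetic sanity check of the algebra above, under the convention that the relative motion maps view-1 camera coordinates into view 2 (X2 = R X1 + t). The matrices and the test point below are made up for illustration, and E/F are rebuilt inline so the snippet is self-contained:

```python
import numpy as np

def skew(t):
    """Skew-symmetric cross-product matrix [t]_x."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Synthetic two-view setup: 10 degree rotation about Y, small translation.
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([1.0, 0.0, 0.2])
K_demo = np.array([[600.0, 0.0, 320.0],
                   [0.0, 600.0, 240.0],
                   [0.0, 0.0, 1.0]])

E = skew(t) @ R                                # E = [t]_x R
F_demo = np.linalg.inv(K_demo).T @ E @ np.linalg.inv(K_demo)

# Project a 3D point into both views; the epipolar constraint must hold.
X1 = np.array([0.5, -0.3, 4.0])                # point in view-1 coordinates
X2 = R @ X1 + t                                # same point in view-2 coordinates
x1 = K_demo @ X1; x1 /= x1[2]                  # pixel coordinates, view 1
x2 = K_demo @ X2; x2 /= x2[2]                  # pixel coordinates, view 2
residual = x2 @ F_demo @ x1                    # x2^T F x1, zero up to float error
```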
num_images = len(undistorted_images)
F = []
for i in range(num_images-1):
R = camera_movement[i][:3, :3]
t = camera_movement[i][:3, 3]
E = get_essential_matrix(R, t)
F.append(get_fundamental_matrix(E, K))
print(F)
[array([[-4.28363670e-22, 9.93913280e-06, -1.05735875e-03],
[ 1.40560564e-05, -4.41360389e-07, -2.44523175e-02],
[-1.49533109e-03, 1.50993082e-02, 1.00000000e+00]]), array([[ 6.96713717e-22, 1.06817758e-05, -1.13636364e-03],
[ 1.06817758e-05, -6.13874952e-22, 2.25903012e-03],
[-1.13636364e-03, -1.16589928e-02, 1.00000000e+00]]), array([[-6.56430105e-06, 1.09822897e-05, 5.99649959e-03],
[ 9.28332357e-06, 7.20810283e-06, -2.29953094e-02],
[-3.79544239e-03, 1.02149656e-02, 1.00000000e+00]]), array([[-1.03326556e-22, 3.81056112e-06, -1.04790431e-03],
[ 5.38894722e-06, -1.20099078e-22, -9.15351177e-03],
[-1.48196048e-03, 5.51714814e-03, 1.00000000e+00]]), array([[ 4.37686340e-23, 4.13223140e-06, -1.13636364e-03],
[ 4.13223140e-06, -0.00000000e+00, 7.82172373e-04],
[-1.13636364e-03, -4.41853601e-03, 1.00000000e+00]]), array([[ 3.16127319e-06, 3.16127319e-06, -4.24965438e-03],
[ 6.89910318e-07, -2.67202356e-06, -4.23413869e-03],
[ 8.16760564e-04, 2.94976229e-03, 1.00000000e+00]]), array([[-1.31878981e-05, 1.13296507e-05, 1.51684870e-02],
[-2.79757564e-05, 2.40338186e-05, 3.21772201e-02],
[-1.58452316e-02, -3.66386001e-02, 1.00000000e+00]]), array([[-4.57578416e-22, -3.20747801e-06, -1.13636364e-03],
[-3.20747801e-06, -2.51801851e-22, -1.44318674e-03],
[-1.13636364e-03, 4.26576739e-03, 1.00000000e+00]])]
Projecting epipolar lines¶
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']  # Matplotlib color codes
for i in range(num_images - 1):
    img1 = cv2.cvtColor(cv2.imread(undistorted_images[i]), cv2.COLOR_BGR2RGB)
    img2 = cv2.cvtColor(cv2.imread(undistorted_images[i + 1]), cv2.COLOR_BGR2RGB)
    fig, axs = plt.subplots(1, 2, figsize=(10, 5))
    # Features in first image
    axs[0].imshow(img1)
    k = 0
    for p1 in feature_data[i]:
        if p1[0] == -1:
            continue
        axs[0].plot(p1[1], p1[0], colors[k] + 'o', markersize=3)  # features are stored (row, col)
        k += 1
    axs[1].imshow(img2)
    # Epipolar lines from features in second image using respective colours
    k = 0
    for p1 in feature_data[i]:
        if p1[0] == -1:
            continue
        x1 = np.array([p1[1], p1[0], 1])  # homogeneous coordinates
        l2 = F[i] @ x1  # epipolar line
        l2 = [l2[0] / l2[2], l2[1] / l2[2], 1.0]
        h, w = img2.shape[:2]
        # two points on the epipolar line (axline draws the infinite line through them)
        pt_a = (0, -l2[2] / l2[1])
        pt_b = (h, -(l2[2] + l2[0] * h) / l2[1])
        axs[1].axline(pt_b, pt_a, color=colors[k])
        k += 1
    # Features in second image
    for p1 in feature_data[i + 1]:
        if p1[0] == -1:
            continue
        axs[1].plot(p1[1], p1[0], 'ro', markersize=3)
    plt.show()
Finding correspondences between consecutive images¶
num_features = len(feature_data[0])
def match_feature_(feature, features_to_match, F, distance_threshold=2):
    """
    Matches a feature to the closest feature (by epipolar distance) in another view.
    """
    if feature[0] == -1:
        return -1
    x1 = np.array([feature[1], feature[0], 1])
    epipolar_line = F @ x1
    epipolar_line = [epipolar_line[0] / epipolar_line[2], epipolar_line[1] / epipolar_line[2], 1.0]
    best_match = -1
    min_distance = float('inf')
    for l in range(len(features_to_match)):
        if features_to_match[l, 0] != -1:
            x2 = np.array([features_to_match[l, 1], features_to_match[l, 0], 1])
            # perpendicular point-to-line distance |ax + by + c| / sqrt(a^2 + b^2)
            distance = np.abs(epipolar_line[0] * x2[0] + epipolar_line[1] * x2[1] + epipolar_line[2]) / np.sqrt(epipolar_line[0]**2 + epipolar_line[1]**2)
            if distance < min_distance:
                min_distance = distance
                best_match = l
    if min_distance < distance_threshold:
        return best_match
    return -1
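The criterion inside `match_feature_` is the perpendicular distance from a candidate feature to the epipolar line (a, b, c). As a standalone illustration of the formula |a·x + b·y + c| / sqrt(a² + b²):

```python
import numpy as np

def point_line_distance(line, point):
    """Distance from a 2D point (x, y) to the homogeneous line (a, b, c)."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c) / np.hypot(a, b)

# The vertical line x = 2 is (1, 0, -2); the point (5, 7) lies 3 pixels away.
d = point_line_distance((1.0, 0.0, -2.0), (5.0, 7.0))   # 3.0
```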
matches = []
for i in range(num_images - 1):
    matches_i = []
    for j in range(num_features):
        match = match_feature_(feature_data[i, j], feature_data[i + 1], F[i])
        if match != -1:
            matches_i.append((j, match))
    matches.append(matches_i)
    print(f"{i}->{i+1}", matches_i)
0->1 [(0, 1), (1, 4), (2, 3), (3, 0), (4, 2), (5, 6), (6, 5)] 1->2 [(0, 6), (1, 3), (4, 1), (5, 0), (6, 2)] 2->3 [(0, 3), (2, 6), (3, 1), (6, 4)] 3->4 [(1, 0), (3, 5), (4, 4), (5, 1), (6, 3)] 4->5 [(0, 1), (3, 0), (4, 4), (5, 6)] 5->6 [(0, 4), (1, 5), (4, 0), (6, 2)] 6->7 [(4, 4), (6, 6)] 7->8 [(4, 0)]
for i in range(num_images - 1):
    img1 = cv2.cvtColor(cv2.imread(undistorted_images[i]), cv2.COLOR_BGR2RGB)
    img2 = cv2.cvtColor(cv2.imread(undistorted_images[i + 1]), cv2.COLOR_BGR2RGB)
    fig, axs = plt.subplots(1, 2, figsize=(10, 5))
    axs[0].imshow(img1)
    for j, (match1, match2) in enumerate(matches[i]):
        u1, v1 = feature_data[i, match1]
        axs[0].plot(v1, u1, colors[j] + 'o', markersize=3)  # Note: v1 is x, u1 is y for plotting
    axs[1].imshow(img2)
    for j, (match1, match2) in enumerate(matches[i]):
        u2, v2 = feature_data[i + 1, match2]
        axs[1].plot(v2, u2, colors[j] + 'o', markersize=3)  # Note: v2 is x, u2 is y
    axs[0].set_title(f"Image {i}")
    axs[1].set_title(f"Image {i+1}")
    plt.tight_layout()
    plt.show()
Visualising all correspondences¶
def get_transformation_matrices(view_idx):
    """
    Calculates transformation matrices from the given view to all views.
    """
    pose = np.eye(4)
    transformations = [pose]
    # chain forward transformations from view_idx to the last view
    for i in range(num_images - 1 - view_idx):
        pose = camera_movement[i + view_idx] @ pose
        transformations.append(pose)
    # chain inverse transformations from view_idx back to the first view
    pose = np.eye(4)
    for i in range(view_idx):
        pose = np.linalg.inv(camera_movement[view_idx - 1 - i]) @ pose
        transformations.insert(0, pose)
    return transformations
def show_features_from_view(idx):
    transformations = get_transformation_matrices(idx)
    for i in range(num_images):
        if i == idx:
            continue
        img1 = cv2.cvtColor(cv2.imread(undistorted_images[idx]), cv2.COLOR_BGR2RGB)
        img2 = cv2.cvtColor(cv2.imread(undistorted_images[i]), cv2.COLOR_BGR2RGB)
        fig, axs = plt.subplots(1, 2, figsize=(10, 5))
        axs[0].imshow(img1)
        k = 0
        for p1 in feature_data[idx]:
            if p1[0] == -1:
                continue
            axs[0].plot(p1[1], p1[0], colors[k] + 'o', markersize=3)  # features are stored (row, col)
            k += 1
        R = transformations[i][:3, :3]
        t = transformations[i][:3, 3]
        E = get_essential_matrix(R, t)
        F = get_fundamental_matrix(E, K)
        axs[1].imshow(img2)
        k = 0
        bad_pair = False
        for p1 in feature_data[idx]:
            if p1[0] == -1:
                continue
            x1 = np.array([p1[1], p1[0], 1])
            l2 = F @ x1
            l2 = [l2[0] / l2[2], l2[1] / l2[2], 1.0]
            h, w = img2.shape[:2]
            pt_a = (0, -l2[2] / l2[1])
            pt_b = (h, -(l2[2] + l2[0] * h) / l2[1])
            axs[1].axline(pt_b, pt_a, color=colors[k])
            k += 1
            # lines that leave the image area by a wide margin indicate an unusable view pair
            if abs(pt_a[1]) > (h + w) * 2 or abs(pt_b[1]) > (h + w) * 2:
                bad_pair = True
                break
        if bad_pair:
            print(f"! ! ! BAD VIEWS between images {idx} and {i}")
            continue
        for p1 in feature_data[i]:
            if p1[0] == -1:
                continue
            axs[1].plot(p1[1], p1[0], 'ro', markersize=3)
        axs[0].set_title(f"Image {idx}")
        axs[1].set_title(f"Image {i}")
        plt.show()
for i in range(num_images):
show_features_from_view(i)
! ! ! BAD VIEWS between images 0 and 3
! ! ! BAD VIEWS between images 0 and 6
! ! ! BAD VIEWS between images 1 and 4
! ! ! BAD VIEWS between images 1 and 7
! ! ! BAD VIEWS between images 2 and 5
! ! ! BAD VIEWS between images 2 and 8
! ! ! BAD VIEWS between images 3 and 0
! ! ! BAD VIEWS between images 3 and 6
! ! ! BAD VIEWS between images 4 and 1
! ! ! BAD VIEWS between images 4 and 7
! ! ! BAD VIEWS between images 5 and 2
! ! ! BAD VIEWS between images 5 and 8
! ! ! BAD VIEWS between images 6 and 0
! ! ! BAD VIEWS between images 6 and 3
! ! ! BAD VIEWS between images 7 and 1
! ! ! BAD VIEWS between images 7 and 4
! ! ! BAD VIEWS between images 8 and 2
! ! ! BAD VIEWS between images 8 and 5
Building correspondence matrix¶
feature_correspondence_matrix = []
features_matched = []
for i in range(num_images):
    feature = []
    matches = []
    for j in range(num_features):
        feature.append([-1, -1])
        matches.append(False)
    feature_correspondence_matrix.append(feature)
    features_matched.append(matches)
for i in range(num_features):
    feature_correspondence_matrix[0][i] = feature_data[0][i]
    features_matched[0][i] = True
h, w = img2.shape[:2]  # image dimensions (all images share the same size; img2 is left over from the plotting loop above)
ambig_param = 1.1
def match_feature(feature, features_to_match, F, distance_threshold=5):
    """
    Matches a feature to that of another view.
    Returns (1, match) on success, (1, [-1, -1]) if the feature was not found,
    (0, [-1, -1]) for a bad view pair and (0, [0, 0]) for an ambiguous match.
    """
    if feature[0] == -1:
        return (1, [-1, -1])
    x1 = np.array([feature[1], feature[0], 1])
    epipolar_line = F @ x1
    epipolar_line = [epipolar_line[0] / epipolar_line[2], epipolar_line[1] / epipolar_line[2], 1.0]
    # border points of the epipolar line, used to detect unusable view pairs
    y0 = -epipolar_line[2] / epipolar_line[1]
    y1 = -(epipolar_line[2] + epipolar_line[0] * h) / epipolar_line[1]
    if abs(y0) > (h + w) * 2 or abs(y1) > (h + w) * 2:
        return (0, [-1, -1])
    best_match = -1
    min_distance = float('inf')
    ambiguity = False
    for l in range(len(features_to_match)):
        if features_to_match[l, 0] != -1:
            x2 = np.array([features_to_match[l, 1], features_to_match[l, 0], 1])
            distance = np.abs(epipolar_line[0] * x2[0] + epipolar_line[1] * x2[1] + epipolar_line[2]) / np.sqrt(epipolar_line[0]**2 + epipolar_line[1]**2)
            # two candidates closer than ambig_param to each other are ambiguous
            if abs(distance - min_distance) < ambig_param:
                ambiguity = True
            if distance < min_distance:
                if abs(distance - min_distance) >= ambig_param:
                    ambiguity = False
                min_distance = distance
                best_match = l
    if ambiguity:
        return (0, [0, 0])
    if min_distance < distance_threshold:
        return (1, features_to_match[best_match])
    return (1, [-1, -1])
def find_missing_correspondences(view_idx):
    transformations = get_transformation_matrices(view_idx)
    for i in range(num_images):
        if i == view_idx:
            continue
        # skip views whose features are already fully matched
        images_matched = True
        for j in range(num_features):
            if features_matched[i][j] == False:
                images_matched = False
        if images_matched:
            continue
        R = transformations[i][:3, :3]
        t = transformations[i][:3, 3]
        E = get_essential_matrix(R, t)
        F = get_fundamental_matrix(E, K)
        correspondences = []
        matched = []
        bad_view = False
        for j in range(num_features):
            feature = match_feature(feature_correspondence_matrix[view_idx][j], feature_data[i], F)
            if feature[0] == 0 and feature[1][0] == -1:
                bad_view = True
                break
            elif feature[0] == 0:
                correspondences.append([-1, -1])
                matched.append(False)
            else:
                correspondences.append(feature[1])
                matched.append(True)
        if bad_view:
            continue
        for j in range(num_features):
            if features_matched[i][j] == False:
                feature_correspondence_matrix[i][j] = correspondences[j]
                features_matched[i][j] = matched[j]
for i in range(num_images):
    find_missing_correspondences(i)
    # stop as soon as every feature in every view has been matched
    done = True
    for k in range(num_images):
        for j in range(num_features):
            if features_matched[k][j] == False:
                done = False
                break
    if done:
        break
for i in range(len(feature_correspondence_matrix)):
    print(feature_correspondence_matrix[i])
[array([163, 616], dtype=int32), array([431, 593], dtype=int32), array([380, 672], dtype=int32), array([164, 660], dtype=int32), array([378, 462], dtype=int32), array([280, 422], dtype=int32), array([274, 650], dtype=int32)] [array([183, 694], dtype=int32), array([495, 446], dtype=int32), array([495, 651], dtype=int32), array([192, 722], dtype=int32), array([398, 444], dtype=int32), array([282, 376], dtype=int32), array([346, 704], dtype=int32)] [array([157, 501], dtype=int32), array([384, 621], dtype=int32), [-1, -1], array([157, 540], dtype=int32), [-1, -1], array([291, 480], dtype=int32), array([252, 536], dtype=int32)] [array([199, 614], dtype=int32), [-1, -1], array([409, 690], dtype=int32), array([195, 658], dtype=int32), [-1, -1], array([285, 422], dtype=int32), array([307, 657], dtype=int32)] [array([168, 693], dtype=int32), array([473, 446], dtype=int32), array([474, 681], dtype=int32), array([157, 722], dtype=int32), [-1, -1], array([285, 374], dtype=int32), array([322, 722], dtype=int32)] [array([207, 500], dtype=int32), [-1, -1], [-1, -1], array([207, 539], dtype=int32), [-1, -1], array([285, 481], dtype=int32), array([303, 539], dtype=int32)] [array([177, 610], dtype=int32), array([448, 574], dtype=int32), array([386, 649], dtype=int32), array([187, 650], dtype=int32), array([383, 460], dtype=int32), array([313, 424], dtype=int32), array([284, 636], dtype=int32)] [[-1, -1], [-1, -1], [-1, -1], [-1, -1], array([505, 444], dtype=int32), array([444, 367], dtype=int32), [-1, -1]] [[-1, -1], array([473, 631], dtype=int32), array([350, 602], dtype=int32), [-1, -1], [-1, -1], array([472, 485], dtype=int32), array([259, 554], dtype=int32)]
Finally, see the result¶
for i in range(num_images):
img = cv2.imread(undistorted_images[i])
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
k = 0
for feature in feature_correspondence_matrix[i]:
if feature[0] != -1:
plt.plot(feature[1], feature[0], colors[k] + 'o', markersize=5)
k += 1
plt.title(f"Image {i}")
plt.tight_layout()
plt.show()
Triangulation¶
def get_projection_matrix(K, view_index):
T = get_transformation_matrices(0)[view_index]
P = K @ T[:3]
return P
# Calculate all 3D points with multi-view triangulation (DLT)
def multiview_triangulation(features_matrix, projection_matrices):
triangulated_points = []
for feature_index, feature in enumerate(features_matrix):
A = []
for view_index, point in enumerate(feature):
if np.array_equal(point, [-1, -1]): # skip views where the feature was not found
continue
P = projection_matrices[view_index]
# points are stored in (h, w), i.e. (y, x), order
y, x = point
A.append(x * P[2, :] - P[0, :]) # row from the x-coordinate
A.append(y * P[2, :] - P[1, :]) # row from the y-coordinate
A = np.array(A) # Convert to NumPy array
# Solve A * X = 0 using SVD
_, _, Vt = np.linalg.svd(A)
X_homogeneous = Vt[-1] # Last row of Vt corresponds to smallest singular value
# Convert to Euclidean coordinates
X_euclidean = X_homogeneous[:-1] / X_homogeneous[-1]
triangulated_points.append(X_euclidean)
return triangulated_points
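As a sanity check, the same DLT formulation can be exercised on synthetic data where the ground truth is known. The sketch below (made-up intrinsics and a hypothetical two-camera setup, independent of the notebook's data) projects a known point into two views and recovers it:

```python
import numpy as np

def triangulate_dlt(points_xy, projections):
    """Triangulate one 3D point from (x, y) pixel observations via DLT."""
    A = []
    for (x, y), P in zip(points_xy, projections):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(A))
    X = Vt[-1]  # null vector = homogeneous 3D point
    return X[:3] / X[3]

# Two synthetic cameras: identity pose and a 1-unit baseline along x.
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

X_true = np.array([0.5, -0.2, 4.0])
obs = []
for P in (P0, P1):
    p = P @ np.append(X_true, 1)
    obs.append((p[0] / p[2], p[1] / p[2]))

X_est = triangulate_dlt(obs, [P0, P1])
print(np.allclose(X_est, X_true, atol=1e-6))  # -> True
```

With exact (noise-free) observations the smallest singular value of A is zero, so the recovered point matches the ground truth to numerical precision.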
# Rearrange the correspondence matrix from (9, 7, 2) = (views, features, 2) to (7, 9, 2) = (features, views, 2)
points_2d_all_features = np.array(feature_correspondence_matrix)
points_2d_all_features = np.transpose(points_2d_all_features, axes=(1, 0, 2))
# projection matrices for 9 views
projection_matrices = [get_projection_matrix(K, x) for x in range(0,9)]
# Perform multiview triangulation
points_3d = multiview_triangulation(points_2d_all_features, projection_matrices)
# Print results
for i, point in enumerate(points_3d):
print(f"Feature {i}: {point}")
Feature 0: [ 1.35201537 -0.86230485  4.81112929]
Feature 1: [0.90839178 0.92869221 3.73654327]
Feature 2: [1.63211365 0.744573   4.43802616]
Feature 3: [ 1.60153403 -0.8015058   4.58633108]
Feature 4: [0.1593758  0.72950708 4.45851857]
Feature 5: [-0.11231227  0.03941065  3.93977091]
Feature 6: [ 1.59977506e+00 -4.41454716e-03  4.80063840e+00]
# Check the reprojection of the triangulated 3D points back into view 0
for i in range(7):
a = np.hstack([points_3d[i], [1]])
a = projection_matrices[0] @ a
x, y, _ = a / a[-1] # a[0] is the x/width pixel coordinate, a[1] the y/height one
print(y, x) # printed in (h, w) order to match the feature matrix
162.21231786277394 616.8408014973934
431.40464923232537 592.9857748052221
380.57602276517207 671.4240079881464
165.02622389449078 659.7448171223541
377.96433206211844 462.4946720013839
281.29492490859474 422.06075682657115
274.4213245758087 649.7045239324775
# #Sparse reconstruction
# def sparse_reconstruction(points_3d):
# point_cloud = o3d.geometry.PointCloud()
# point_cloud.points = o3d.utility.Vector3dVector(points_3d)
# point_cloud, ind = point_cloud.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
# return point_cloud
# sparse_cloud = sparse_reconstruction(points_3d)
# o3d.visualization.draw_geometries([sparse_cloud])
# import open3d as o3d
# def generate_sparse_point_cloud(points_3d):
# """
# Generate a sparse point cloud from triangulated 3D points.
# Args:
# points_3d (numpy.ndarray): Triangulated 3D points (N, 3).
# Returns:
# o3d.geometry.PointCloud: Sparse point cloud.
# """
# point_cloud = o3d.geometry.PointCloud()
# point_cloud.points = o3d.utility.Vector3dVector(points_3d)
# return point_cloud
# # Generate and visualize sparse point cloud
# sparse_cloud = generate_sparse_point_cloud(points_3d)
# o3d.visualization.draw_geometries([sparse_cloud])
One-view reconstruction¶
depth_maps = []
images = []
for i in range(len(rgb_paths)):
rgb_image = cv2.imread(undistorted_images[i])
depth_map = cv2.imread(depth_paths[i], cv2.IMREAD_GRAYSCALE)
depth_maps.append(depth_map)
images.append(rgb_image)
Relate depth values to Z-coordinate¶
def get_depth_for_features(feature_correspondence_matrix, depth_maps):
depth_values = []
for image_index, feature_points in enumerate(feature_correspondence_matrix):
image_depth_values = []
depth_map = depth_maps[image_index]
for point in feature_points:
if np.array_equal(point, [-1, -1]):
image_depth_values.append(None)
else:
h_px, w_px = point # feature locations are stored in (h, w) order
depth_value = depth_map[h_px, w_px]
image_depth_values.append(depth_value)
depth_values.append(image_depth_values)
return depth_values
depth_values = get_depth_for_features(feature_correspondence_matrix, depth_maps)
for image_index, image_depth_values in enumerate(depth_values):
print(f"Image {image_index}:")
for feature_index, depth in enumerate(image_depth_values):
print(f" Feature {feature_index}: Depth (0-255 scale) = {depth}")
Image 0:
  Feature 0: Depth (0-255 scale) = 204
  Feature 1: Depth (0-255 scale) = 160
  Feature 2: Depth (0-255 scale) = 190
  Feature 3: Depth (0-255 scale) = 193
  Feature 4: Depth (0-255 scale) = 190
  Feature 5: Depth (0-255 scale) = 167
  Feature 6: Depth (0-255 scale) = 204
Image 1:
  Feature 0: Depth (0-255 scale) = 163
  Feature 1: Depth (0-255 scale) = 150
  Feature 2: Depth (0-255 scale) = 150
  Feature 3: Depth (0-255 scale) = 147
  Feature 4: Depth (0-255 scale) = 200
  Feature 5: Depth (0-255 scale) = 186
  Feature 6: Depth (0-255 scale) = 160
Image 2:
  Feature 0: Depth (0-255 scale) = 237
  Feature 1: Depth (0-255 scale) = 196
  Feature 2: Depth (0-255 scale) = None
  Feature 3: Depth (0-255 scale) = 237
  Feature 4: Depth (0-255 scale) = None
  Feature 5: Depth (0-255 scale) = 167
  Feature 6: Depth (0-255 scale) = 247
Image 3:
  Feature 0: Depth (0-255 scale) = 200
  Feature 1: Depth (0-255 scale) = None
  Feature 2: Depth (0-255 scale) = 165
  Feature 3: Depth (0-255 scale) = 194
  Feature 4: Depth (0-255 scale) = None
  Feature 5: Depth (0-255 scale) = 155
  Feature 6: Depth (0-255 scale) = 190
Image 4:
  Feature 0: Depth (0-255 scale) = 161
  Feature 1: Depth (0-255 scale) = 130
  Feature 2: Depth (0-255 scale) = 128
  Feature 3: Depth (0-255 scale) = 151
  Feature 4: Depth (0-255 scale) = None
  Feature 5: Depth (0-255 scale) = 175
  Feature 6: Depth (0-255 scale) = 149
Image 5:
  Feature 0: Depth (0-255 scale) = 249
  Feature 1: Depth (0-255 scale) = None
  Feature 2: Depth (0-255 scale) = None
  Feature 3: Depth (0-255 scale) = 249
  Feature 4: Depth (0-255 scale) = None
  Feature 5: Depth (0-255 scale) = 166
  Feature 6: Depth (0-255 scale) = 249
Image 6:
  Feature 0: Depth (0-255 scale) = 208
  Feature 1: Depth (0-255 scale) = 179
  Feature 2: Depth (0-255 scale) = 206
  Feature 3: Depth (0-255 scale) = 205
  Feature 4: Depth (0-255 scale) = 207
  Feature 5: Depth (0-255 scale) = 180
  Feature 6: Depth (0-255 scale) = 214
Image 7:
  Feature 0: Depth (0-255 scale) = None
  Feature 1: Depth (0-255 scale) = None
  Feature 2: Depth (0-255 scale) = None
  Feature 3: Depth (0-255 scale) = None
  Feature 4: Depth (0-255 scale) = 234
  Feature 5: Depth (0-255 scale) = 198
  Feature 6: Depth (0-255 scale) = None
Image 8:
  Feature 0: Depth (0-255 scale) = None
  Feature 1: Depth (0-255 scale) = 206
  Feature 2: Depth (0-255 scale) = 242
  Feature 3: Depth (0-255 scale) = None
  Feature 4: Depth (0-255 scale) = None
  Feature 5: Depth (0-255 scale) = 162
  Feature 6: Depth (0-255 scale) = 231
def get_point_depth_in_view(point, view_idx):
depth_map = depth_maps[view_idx]
depth_value = depth_map[point[0], point[1]]
return depth_value
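Since each depth image stores Z divided by that image's (unknown) maximum, one pixel with a known metric depth fixes the scale for the whole image. A minimal sketch of that proportionality, with hypothetical values (the 204 / 4.81 pair mirrors Feature 0 of Image 0 above):

```python
def metric_depth(pixel_value, ref_pixel_value, ref_metric_z):
    """Recover metric Z from an 8-bit depth value, given one reference pixel
    of the SAME depth image whose metric depth is known.
    pixel = Z / Z_max * 255  =>  Z = ref_metric_z * pixel / ref_pixel."""
    return ref_metric_z * pixel_value / ref_pixel_value

# Hypothetical example: the reference pixel reads 204 and was triangulated at
# Z = 4.81; a pixel reading 102 then lies at exactly half that depth.
print(metric_depth(102, 204, 4.81))  # -> 2.405
```

This is the same ratio used by `get_depths` below, with the triangulated Feature 0 serving as the reference.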
Generate pixel coordinate datasets for point cloud¶
dataset_height = range(1, 551, 2)
dataset_width = range(1, 881, 2)
dataset_coordinates = []
for i in range(len(dataset_height)):
for j in range(len(dataset_width)):
dataset_coordinates.append([dataset_height[i], dataset_width[j]])
#print(dataset_coordinates)
print(len(dataset_coordinates))
121000
def get_depths(points):
depths = []
for i in range(len(points)):
pixel_depth = get_point_depth_in_view(points[i], 0)
if pixel_depth == 255 or pixel_depth == 0:
depths.append(-1)
else:
depth = points_3d[0][2] * pixel_depth / get_point_depth_in_view(feature_correspondence_matrix[0][0], 0)
depths.append(depth)
return depths
dataset_z = get_depths(dataset_coordinates)
#print(dataset_z)
dataset_3d = []
for i in range(len(dataset_coordinates)):
z = dataset_z[i]
if z == -1:
continue
# coordinates are (h, w): the width/column entry pairs with cx = principal_point[1]
x = (dataset_coordinates[i][1] - principal_point[1]) / focal_length_px * z
y = (dataset_coordinates[i][0] - principal_point[0]) / focal_length_px * z
dataset_3d.append([x, y, z])
#print(dataset_3d[:5])
Extract colours for the selected points¶
def get_point_colour_in_view(point, view_idx):
# use the requested view rather than always view 0
colour_value = images[view_idx][point[0], point[1]]
return colour_value
dataset_colours = []
for i in range(len(dataset_coordinates)):
z = dataset_z[i]
if (z == -1):
continue
colour = get_point_colour_in_view(dataset_coordinates[i], 0)
dataset_colours.append(colour)
Get mesh via utility function!¶
def ply_creator(input_3d, rgb_data, filename):
assert input_3d.ndim == 2, "Pass 3D points as an (N, 3) array"
text1 = """ply\nformat ascii 1.0"""
text2 = "element vertex " + str(input_3d.shape[0])
text3 = """property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header"""
with open(filename, 'w') as ply_file:
ply_file.write(text1 + "\n")
ply_file.write(text2 + "\n")
ply_file.write(text3 + "\n")
for i in range(input_3d.shape[0]):
ply_file.write(f"{input_3d[i, 0]} {input_3d[i, 1]} {input_3d[i, 2]} {rgb_data[i][2]} {rgb_data[i][1]} {rgb_data[i][0]}\n")
dataset_3d = np.array(dataset_3d)
dataset_colours = np.array(dataset_colours)
ply_creator(dataset_3d, dataset_colours, 'sparse_reconstruction.ply')
Dense Reconstruction¶
Load Depth Maps, Calibration Data and Camera poses¶
# Load depth maps and images
depth_maps = []
images = []
for i in range(len(rgb_paths)):
rgb_image = cv2.imread(rgb_paths[i])
depth_map = cv2.imread(depth_paths[i], cv2.IMREAD_GRAYSCALE)
depth_maps.append(depth_map)
images.append(rgb_image)
# Confirm data loading
print(f"Loaded {len(depth_maps)} depth maps and {len(images)} RGB images.")
Loaded 9 depth maps and 9 RGB images.
depth_maps
[array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 117, 117, 117],
[255, 255, 255, ..., 117, 117, 117],
[255, 255, 255, ..., 117, 117, 117]], shape=(551, 881), dtype=uint8),
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[ 97, 97, 97, ..., 255, 255, 255],
[ 97, 97, 97, ..., 255, 255, 255],
[ 97, 97, 97, ..., 255, 255, 255]], shape=(551, 881), dtype=uint8)]
# Extract calibration parameters
calibration_data = np.load("data/camera_calibration.npy", allow_pickle=True).item()
# Construct the intrinsic matrix
focal_length_px = calibration_data["focal_length_px"]
principal_point = calibration_data["principal_point"]
intrinsics = np.array([
[focal_length_px, 0, principal_point[1]],
[0, focal_length_px, principal_point[0]],
[0, 0, 1]
])
print("Constructed Intrinsic Matrix:\n", intrinsics)
Constructed Intrinsic Matrix:
 [[629.28571429   0.         440.        ]
 [  0.         629.28571429 275.        ]
 [  0.           0.           1.        ]]
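A quick consistency check on the intrinsic matrix: the principal point must back-project to the optical axis. A minimal sketch using the printed values (note cx = 440 is the width coordinate, cy = 275 the height coordinate):

```python
import numpy as np

# Intrinsics as constructed above: cx comes from principal_point[1] (width).
f = 629.28571429
K = np.array([[f, 0, 440.0], [0, f, 275.0], [0, 0, 1]])

# The principal point back-projects to the optical axis (x = y = 0) at unit depth.
ray = np.linalg.inv(K) @ np.array([440.0, 275.0, 1.0])
print(np.allclose(ray, [0, 0, 1]))  # -> True
```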
# Load camera poses from file
poses = np.load("data/camera_movement.npy")
print("Camera Poses Shape:", poses.shape) # Should be (N, 4, 4) for N views
Camera Poses Shape: (8, 4, 4)
for i, pose in enumerate(poses):
print(f"Pose {i}:\n{pose}\n")
Pose 0:
[[ 0.70710678 -0.1830127   0.6830127  -3.        ]
 [ 0.1830127   0.98037987  0.0732233  -0.25881905]
 [-0.6830127   0.0732233   0.72672691  0.96592583]
 [ 0.          0.          0.          1.        ]]

Pose 1:
[[ 6.12323400e-17  2.58819045e-01 -9.65925826e-01  4.00000000e+00]
 [-2.58819045e-01  9.33012702e-01  2.50000000e-01 -1.03527618e+00]
 [ 9.65925826e-01  2.50000000e-01  6.69872981e-02  3.86370331e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

Pose 2:
[[ 7.07106781e-01 -1.83012702e-01  6.83012702e-01 -2.82842712e+00]
 [ 1.29893408e-16  9.65925826e-01  2.58819045e-01 -1.00000000e+00]
 [-7.07106781e-01 -1.83012702e-01  6.83012702e-01  1.41421356e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

Pose 3:
[[ 7.07106781e-01 -1.29893408e-16  7.07106781e-01 -3.00000000e+00]
 [ 1.29893408e-16  1.00000000e+00  5.38036114e-17 -2.22044605e-16]
 [-7.07106781e-01  5.38036114e-17  7.07106781e-01  1.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

Pose 4:
[[ 6.12323400e-17  1.83697020e-16 -1.00000000e+00  4.00000000e+00]
 [-1.83697020e-16  1.00000000e+00  1.83697020e-16 -7.77156117e-16]
 [ 1.00000000e+00  1.83697020e-16  6.12323400e-17  4.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

Pose 5:
[[ 7.07106781e-01 -2.08398031e-16  7.07106781e-01 -2.82842712e+00]
 [ 2.98836239e-01  9.06307787e-01 -2.98836239e-01  1.21494310e+00]
 [-6.40856382e-01  4.22618262e-01  6.40856382e-01  2.12694929e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

Pose 6:
[[ 0.70710678 -0.29883624  0.64085638 -3.        ]
 [ 0.5         0.85216551 -0.15431765  1.41421356]
 [-0.5         0.42954725  0.75199013  0.        ]
 [ 0.          0.          0.          1.        ]]

Pose 7:
[[ 6.12323400e-17  7.07106781e-01 -7.07106781e-01  2.00000000e+00]
 [-7.07106781e-01  5.00000000e-01  5.00000000e-01 -1.41421356e+00]
 [ 7.07106781e-01  5.00000000e-01  5.00000000e-01  1.41421356e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]
# # Ensure each depth map is scaled to real-world units
# depth_maps_real = []
# for i, depth_map in enumerate(depth_maps):
# max_depth = np.max(depth_map) / 255.0 * 10.0 # Adjust scaling factor as needed
# depth_maps_real.append(depth_map / 255.0 * max_depth)
# depth_maps = np.array(depth_maps_real)
def calculate_depths(feature_points, depth_map, reference_feature, reference_depth):
"""
Calculate depths for 2D points based on the depth proportion from a reference feature.
Args:
feature_points (list of list): List of 2D points [u, v].
depth_map (numpy.ndarray): Depth map (H, W).
reference_feature (list): Reference feature [u, v].
reference_depth (float): Z-depth of the reference feature.
Returns:
list: Calculated depths for the feature points.
"""
depths = []
ref_depth_value = depth_map[reference_feature[0], reference_feature[1]]
for point in feature_points:
if np.array_equal(point, [-1, -1]): # Ignore invalid points
depths.append(None)
else:
depth_value = depth_map[point[0], point[1]]
if depth_value == 255 or depth_value == 0: # Ignore invalid values
depths.append(None)
else:
depth = reference_depth * (depth_value / ref_depth_value)
depths.append(depth)
return depths
def generate_3d_points(points_2d, depths, intrinsics):
"""
Generate 3D points from 2D points and their depths.
Args:
points_2d (list of list): List of 2D points [u, v].
depths (list): List of calculated depths for each point.
intrinsics (numpy.ndarray): Camera intrinsic matrix (3x3).
Returns:
list: List of 3D points [x, y, z].
"""
fx, fy, cx, cy = intrinsics[0, 0], intrinsics[1, 1], intrinsics[0, 2], intrinsics[1, 2]
points_3d = []
for point, depth in zip(points_2d, depths):
if depth is None:
continue
# points are stored in (h, w) order: the second entry pairs with cx
v, u = point
x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
points_3d.append([x, y, depth])
return points_3d
# Generate 2D points grid
dataset_height = range(1, 551, 5)
dataset_width = range(1, 881, 5)
points_2d = [[h, w] for h in dataset_height for w in dataset_width]
# Reference feature and depth
reference_feature = feature_correspondence_matrix[0][0] # First feature in the first view
reference_depth = points_3d[0][2] # Z-depth of the first triangulated feature
# Calculate depths
depths = calculate_depths(points_2d, depth_maps[0], reference_feature, reference_depth)
# Generate 3D points
points_3d_first_view = generate_3d_points(points_2d, depths, intrinsics)
sparse_cloud = o3d.geometry.PointCloud()
sparse_cloud.points = o3d.utility.Vector3dVector(points_3d_first_view)
# Visualize the sparse point cloud
o3d.visualization.draw_geometries([sparse_cloud])
def generate_dense_points(depth_map, rgb_image, intrinsics, pose, max_depth):
"""
Generate a dense point cloud from a single depth map and RGB image.
Args:
depth_map (numpy.ndarray): Depth map (H, W).
rgb_image (numpy.ndarray): RGB image (H, W, 3).
intrinsics (numpy.ndarray): Camera intrinsic matrix (3x3).
pose (numpy.ndarray): Camera pose matrix (4x4).
max_depth (float): Maximum depth value for the current depth map.
Returns:
numpy.ndarray: 3D points (N, 3).
numpy.ndarray: RGB colors (N, 3).
"""
h, w = depth_map.shape
fx, fy, cx, cy = intrinsics[0, 0], intrinsics[1, 1], intrinsics[0, 2], intrinsics[1, 2]
points = []
colors = []
for v in range(h):
for u in range(w):
z = depth_map[v, u] / 255.0 * max_depth # Scale depth to real-world units
if z > 0 and z < max_depth: # Ignore invalid or far points
# Back-project to camera coordinates
x = (u - cx) * z / fx
y = (v - cy) * z / fy
point_cam = np.array([x, y, z, 1])
# Transform to world coordinates
point_world = np.dot(pose, point_cam)
points.append(point_world[:3])
# Get color from RGB image
colors.append(rgb_image[v, u, :])
return np.array(points), np.array(colors)
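The per-pixel Python loop above is easy to follow but slow for 551x881 images; the same back-projection can be vectorized with NumPy. A sketch under the same pinhole model (the function name and example values are illustrative, not from the notebook's data):

```python
import numpy as np

def backproject_dense(depth, K, pose):
    """Back-project a metric depth map to world-space points of shape (N, 3)."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel column/row grids
    z = depth
    valid = z > 0  # drop invalid depths
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Stack homogeneous camera-space points and transform to world space.
    pts_cam = np.stack([x[valid], y[valid], z[valid], np.ones(valid.sum())])
    return (pose @ pts_cam)[:3].T

# Tiny example: a 2x2 depth map at constant depth 2 with an identity pose.
K = np.array([[100.0, 0, 0.5], [0, 100.0, 0.5], [0, 0, 1]])
pts = backproject_dense(np.full((2, 2), 2.0), K, np.eye(4))
print(pts.shape)  # -> (4, 3)
```

The corresponding colors can be gathered with the same mask, e.g. `rgb_image[valid]`, which keeps points and colors aligned.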
# Generate partial dense point clouds
dense_points = []
dense_colors = []
for i in range(len(depth_maps) - 1):  # only 8 poses are available for the 9 images
max_depth = np.max(depth_maps[i]) / 255.0 * 10.0 # Calculate max depth per depth map
points, colors = generate_dense_points(depth_maps[i], images[i], intrinsics, poses[i], max_depth)
dense_points.append(points)
dense_colors.append(colors)
# Combine all dense points into a single point cloud
final_dense_points = np.vstack(dense_points)
final_dense_colors = np.vstack(dense_colors)
# Create a dense point cloud
dense_cloud = o3d.geometry.PointCloud()
dense_cloud.points = o3d.utility.Vector3dVector(final_dense_points)
dense_cloud.colors = o3d.utility.Vector3dVector(final_dense_colors / 255.0) # Normalize colors
# Visualize the dense point cloud
o3d.visualization.draw_geometries([dense_cloud])
# Save the dense reconstruction
o3d.io.write_point_cloud("final_dense_reconstruction.ply", dense_cloud)
print("Final dense reconstruction saved as 'final_dense_reconstruction.ply'.")
Final dense reconstruction saved as 'final_dense_reconstruction.ply'.
Archive¶
for i, pose in enumerate(poses):
print(f"Camera Pose {i}:\n{pose}\n")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 for i, pose in enumerate(poses):
      2     print(f"Camera Pose {i}:\n{pose}\n")

NameError: name 'poses' is not defined
for i, depth_map in enumerate(depth_maps):
print(f"Depth Map {i} - Min: {np.min(depth_map)}, Max: {np.max(depth_map)}")
Depth Map 0 - Min: 110, Max: 255
Depth Map 1 - Min: 112, Max: 255
Depth Map 2 - Min: 103, Max: 255
Depth Map 3 - Min: 88, Max: 255
Depth Map 4 - Min: 117, Max: 255
Depth Map 5 - Min: 114, Max: 255
Depth Map 6 - Min: 130, Max: 255
Depth Map 7 - Min: 114, Max: 255
Depth Map 8 - Min: 94, Max: 255
# Sample point in camera coordinates
sample_point = np.array([0.5, 0.5, 2.0, 1.0]) # Replace with real values
transformed_point = np.dot(poses[0], sample_point)
print(f"Transformed Point (World Coordinates): {transformed_point[:3]}")
Transformed Point (World Coordinates): [-1.37192756 0.46932385 2.11448494]
for i, (points, colors) in enumerate(zip(dense_points, dense_colors)):
cloud = o3d.geometry.PointCloud()
cloud.points = o3d.utility.Vector3dVector(points)
cloud.colors = o3d.utility.Vector3dVector(colors / 255.0)
o3d.visualization.draw_geometries([cloud], window_name=f"View {i}")
import numpy as np
import open3d as o3d
# Step 1: Normalize Depth Maps
def normalize_depth_maps(depth_maps):
depth_maps_real = []
for depth_map in depth_maps:
max_depth = np.max(depth_map) / 255.0 * 5.0 # Adjust scaling factor
depth_maps_real.append(depth_map / 255.0 * max_depth)
return depth_maps_real
depth_maps_real = normalize_depth_maps(depth_maps)
# Step 2: Generate Dense Point Clouds for All Views
def generate_dense_points(depth_map, rgb_image, intrinsics, pose):
h, w = depth_map.shape
fx, fy, cx, cy = intrinsics[0, 0], intrinsics[1, 1], intrinsics[0, 2], intrinsics[1, 2]
points = []
colors = []
for v in range(h):
for u in range(w):
z = depth_map[v, u]
if z > 0 and z < 10.0: # Ignore invalid or far points
x = (u - cx) * z / fx
y = (v - cy) * z / fy
point_cam = np.array([x, y, z, 1])
point_world = np.dot(pose, point_cam)
points.append(point_world[:3])
colors.append(rgb_image[v, u, :])
return np.array(points), np.array(colors)
dense_points = []
dense_colors = []
for i in range(len(depth_maps)-1):
points, colors = generate_dense_points(depth_maps_real[i], images[i], intrinsics, poses[i])
dense_points.append(points)
dense_colors.append(colors)
# Step 3: Merge and Align Partial Point Clouds
def merge_and_align_point_clouds(dense_points, dense_colors):
merged_cloud = None
for i, (points, colors) in enumerate(zip(dense_points, dense_colors)):
cloud = o3d.geometry.PointCloud()
cloud.points = o3d.utility.Vector3dVector(points)
cloud.colors = o3d.utility.Vector3dVector(colors / 255.0)
if merged_cloud is None:
merged_cloud = cloud
else:
icp_result = o3d.pipelines.registration.registration_icp(
cloud, merged_cloud, max_correspondence_distance=0.05,
estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint()
)
cloud.transform(icp_result.transformation)
merged_cloud += cloud
return merged_cloud
# Merge and refine alignment
final_dense_cloud = merge_and_align_point_clouds(dense_points, dense_colors)
# Step 4: Visualize and Save the Final Dense Reconstruction
o3d.visualization.draw_geometries([final_dense_cloud])
o3d.io.write_point_cloud("refined_dense_reconstruction.ply", final_dense_cloud)
print("Refined dense reconstruction saved as 'refined_dense_reconstruction.ply'.")
Refined dense reconstruction saved as 'refined_dense_reconstruction.ply'.
Data¶
This should load all available data and also create some output directories. Feel free to rename variables or add additional directories as you see fit.
Integrate Depth Maps¶
def integrate_depth_maps(depth_maps, intrinsics, poses):
"""
Integrate depth maps into a dense point cloud.
Args:
depth_maps (list of numpy.ndarray): Depth maps for each view.
intrinsics (numpy.ndarray): Camera intrinsic matrix.
poses (list of numpy.ndarray): Camera pose matrices.
Returns:
numpy.ndarray: Dense 3D points (N, 3).
"""
dense_points = []
fx, fy, cx, cy = intrinsics[0, 0], intrinsics[1, 1], intrinsics[0, 2], intrinsics[1, 2]
for depth_map, pose in zip(depth_maps, poses):
h, w = depth_map.shape
for y in range(h):
for x in range(w):
z = depth_map[y, x]
if z > 0: # Ignore invalid depth values
X_cam = np.array([(x - cx) * z / fx, (y - cy) * z / fy, z, 1])
X_world = np.dot(pose, X_cam)[:3]
dense_points.append(X_world)
return np.array(dense_points)
# Integrate depth maps into a dense cloud
dense_points = integrate_depth_maps(depth_maps, intrinsics, poses)
# Combine sparse and dense points
combined_points = np.vstack([np.asarray(sparse_cloud.points), dense_points])
from scipy.spatial import Delaunay
def generate_mesh(points_3d):
"""
Generate a surface mesh from 3D points using Delaunay triangulation.
Args:
points_3d (numpy.ndarray): 3D points (N, 3).
Returns:
tuple: (vertices, faces) of the generated mesh.
"""
points_2d = points_3d[:, :2]
tri = Delaunay(points_2d)
vertices = points_3d
faces = tri.simplices
return vertices, faces
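The 2.5D strategy above (triangulating only the x-y projection and keeping z as a height) can be seen on a toy example; note it only suits roughly height-field-like surfaces, not closed scenes. A minimal sketch:

```python
import numpy as np
from scipy.spatial import Delaunay

# Four corners of a tilted patch with z = x + y (slightly perturbed to stay
# in general position).
points_3d = np.array([[0, 0, 0], [1, 0, 1], [0, 1, 1], [0.9, 1.1, 2]], dtype=float)
tri = Delaunay(points_3d[:, :2])  # triangulate the x-y projection only
faces = tri.simplices             # each face indexes into points_3d
print(len(faces))                 # a convex quadrilateral splits into 2 triangles
```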
# Generate mesh
vertices, faces = generate_mesh(combined_points)
def save_mesh_to_ply(vertices, faces, filename="dense_reconstruction.ply"):
"""
Save a mesh to a PLY file.
Args:
vertices (numpy.ndarray): Mesh vertices (N, 3).
faces (numpy.ndarray): Mesh faces (M, 3).
filename (str): Output filename.
"""
with open(filename, "w") as f:
f.write("ply\n")
f.write("format ascii 1.0\n")
f.write(f"element vertex {len(vertices)}\n")
f.write("property float x\n")
f.write("property float y\n")
f.write("property float z\n")
f.write(f"element face {len(faces)}\n")
f.write("property list uchar int vertex_indices\n")
f.write("end_header\n")
for vertex in vertices:
f.write(f"{vertex[0]} {vertex[1]} {vertex[2]}\n")
for face in faces:
f.write(f"3 {face[0]} {face[1]} {face[2]}\n")
# Save the mesh
save_mesh_to_ply(vertices, faces, "dense_reconstruction.ply")
print("Dense reconstruction saved as 'dense_reconstruction.ply'.")
Dense reconstruction saved as 'dense_reconstruction.ply'.